Sea Surface Temperature and Size Change Expectations
Code
library(gt)library(tidyverse)library(gmRi)library(tidyverse)library(ggeffects)library(zoo)library(lme4)library(mgcv)library(patchwork)library(emmeans)library(rstatix)library(ggpubr)theme_set(theme_gmri() +theme(plot.margin =margin()) )#--------------------## Set up#--------------------# Load the data df <-read_csv(here::here("data/size_and_spectra_model_data.csv"))# Get raw landingslandings_raw <-read_csv(here::here("data/unscaled_spectra_predictor_dataset_wide.csv")) %>%select(year, survey_area, landings_raw = landings)# Get the raw sst from oisstsst_raw =read_csv(here::here("data/unscaled_regional_sst.csv"))# Put it all together and clarify the column namesdf <- df %>%select(-sst) %>%left_join(landings_raw) %>%left_join(sst_raw) %>%mutate(survey_area =factor( survey_area, levels =c("Gulf of Maine", "Georges Bank", "Southern New England", "Mid-Atlantic Bight")),region =str_replace_all(survey_area, "-| ", "_"),region =factor( region, levels =c("Gulf_of_Maine", "Georges_Bank", "Mid_Atlantic_Bight", "Southern_New_England")))
The following tabs detail how I’m thinking about whether to use SST in celsius or Celsius as anomalies to test our hypotheses about temperature’s impact on size and size structure:
Raw SST should (and does) increase with lower latitudes in the Northern hemisphere.
The temperature size rule does not predict that the mean/median size of a community should be correlated to the ambient temperature. Though this may be the case, this is not what it contends with directly, or that is not my understanding of it.
My thinking is that each regional community is likely composed of individuals that on-average are living near temperatures that are neither significantly “colder” or “warmer” than their preference. If there are mean/median differences in size, then I don’t think it is a manifestation of the TSR.
Code
p1 <-ggplot(df) +geom_point(aes(sst, med_len, color = survey_area)) +geom_smooth(method ="lm", aes(sst, med_len)) +labs(x ="SST", y ="Median Length", title ="Median length is Negatively\nCorrelated With SST")p2 <-ggplot(df) +geom_point(aes(sst, med_wt, color = survey_area)) +geom_smooth(method ="lm", aes(sst, med_wt)) +labs(x ="SST", y ="Median Weight", title ="Median Weight is as Well")p3 <-ggplot(df) +geom_point(aes(sst, b, color = survey_area)) +geom_smooth(method ="lm", aes(sst, b)) +labs(x ="SST", y ="Size Spectra Slope", title ="Size Spectra Slope Also Appears to Be...\nBut it is unclear why it would.") p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect")
Anomalies are based off a 30-year climatology for each respective region, and should capture departures from the long-term average. This helps out with a problem of modeling SST x Region interactions, where the distributions of sst for the regions do not overlap well.
The TSR contends that growth rates for species reared under higher temperatures show a plastic response where growth rates at early ages are elevated, and maturation happens earlier and at smaller sizes.
The temperature-size rule (TSR) is a widespread phenomenon, which describes the phenotypic plastic response of species’ size to temperature: individuals reared at colder temperatures mature as larger adults than at warmer temperatures.
This effect IMO is better seen through anomalies, and I also would not expect sst anomalies to operate differently between regions since they are region-specific.
Code
p1 <-ggplot(df) +geom_boxplot(aes(x = survey_area, y = sst, color = survey_area)) +labs(x ="Area", y ="OISST", title ="There are Different Baseline SSTs\nAmong Regions")p2 <-ggplot(df, aes(x = survey_area, y = sst_anom, color = survey_area)) +geom_boxplot() +labs(title ="Anomalies (or scaled values)\nFrame in terms of regional average", x ="Area", y ="OISST Anomaly")p3 <-ggplot(drop_na(df, sst), aes(year, sst_anom, color = survey_area)) +geom_point(alpha =0.4) +geom_smooth(se = F, linewidth =1) +labs(title ="All regions have elevated\nSST conditions in last decade", x ="Year", y ="OISST Anomalies")p4 <-ggplot(drop_na(df, ersst), aes(year, ersst_anom, color = survey_area)) +geom_point(alpha =0.4) +geom_smooth(se = F, linewidth =1) +labs(title ="Departure from long-term trend\nstarts in 2000's", x ="Year", y ="ERSST Anomalies")(p1 + p3) +plot_layout(guides ="collect")
Code
(p2 + p4) +plot_layout(guides ="collect")
These next plots explore the degree that SST anomalies are correlated with median size and spectra slope, and thoughts on what that relationship could/should be operating.
How do anomalies look with respect to these body-size metrics?
In a bit of a surprise to me, there does not seem to be any clear signal that any of these body-size related indicators are correlated with SSt anomalies.
Code
p1 <-ggplot(df, aes(sst_anom, med_len)) +geom_point(aes(color = survey_area)) +geom_smooth(method ="lm", color ="black") +#geom_smooth(method = "lm", aes(color = survey_area), se = F) +labs(x ="SST Anomaly", y ="Median Length")p2 <-ggplot(df, aes(sst_anom, med_wt)) +geom_point(aes(color = survey_area)) +geom_smooth(method ="lm", color ="black") +#geom_smooth(method = "lm", aes(color = survey_area), se = F) +labs(x ="SST Anomaly", y ="Median Weight")p3 <-ggplot(df, aes(sst_anom, b)) +geom_point(aes(color = survey_area)) +geom_smooth(method ="lm", color ="black") +#geom_smooth(method = "lm", aes(color = survey_area), se = F) +labs(x ="SST Anomaly", y ="Size Spectra Slope") p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect") +plot_annotation(title ="SST anomalies show no global correlation, could be an interaction situation, could be nothing...")
Its possible that maybe the regions would have some differential response. However, I wouldn’t anticipate them to behave differently under expectations from the TSR.
It also does not appear that there is much evidence for that either…
Code
p1 <-ggplot(df, aes(sst_anom, med_len, color = survey_area)) +geom_point() +geom_smooth(method ="lm") +labs(x ="SST Anomaly", y ="Median Length")p2 <-ggplot(df, aes(sst_anom, med_wt, color = survey_area)) +geom_point() +geom_smooth(method ="lm") +labs(x ="SST Anomaly", y ="Median Weight")p3 <-ggplot(df, aes(sst_anom, b, color = survey_area)) +geom_point() +geom_smooth(method ="lm") +labs(x ="SST Anomaly", y ="Size Spectra Slope") p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect") +plot_annotation(title ="SST Anomaly x Region Interaction")
I would almost expect SST anomalies to have a non-linear response shape of some dome-like shape mirroring a thermal preference curve.
This visual check isn’t very convincing of that though.
So we know that using SST itself has some problems:
1. SST from different regions have poor overlap
2. SST is colinear with the passage of time and with the declines in landings
From Bart:
Therefore, I think the best approach is to drop SST and focus on year as the predictor (and a proxy for SST).
My Thoughts:
Why would we want a proxy for SST when we have SST? Wouldn’t it be more prudent to make a statement on a lack of SST’s association with median size and spectra slope?
Including region and year brings in additional complications of collinearity. Each of the other variables shows trends through time. While it would be possible to follow a differencing procedure (e.g. remove the temporal trend) in these variables, I think a simpler and must easier approach is to just include year as both a predictor and a random effect, and remove all other covariates. What we are doing here is assuming that there is lots of stuff that differs with year, but we are going to push all that variation into the random effect component of the model in order to focus on the temporal trends by region.
My Thoughts:
I can see some of the rationale of going this route for the convenience of navigating the multicollineary of time+sst+landings.
To me, this option does not test either hypotheses for either landings or SST. I believe we’d have a decent model in terms of unbiased predictions and decent performance, but it seems like kind of a hands in the air for our original hypotheses and just focusing on whether there were changes over time.
Exploring years as a Random Effect
I’m also confused whether you can/should use year both as a fixed and a random. And also whether you are using it as a continuous variable for fixed and intended to use it as a categorical variable in the random effects term. I don’t think random effects can be continuous variables.
The model formula you had in your summary resembles what I would anticipate for having “year” as a continuous fixed effect.
Code
# Model summaries look the same if you enforce year as a factor for random effect or not# ggpredict plots do not.mod_regionXyear <-lmer(b ~ year*survey_area + (1|year), data = df)cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var ="year")p1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = year.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Mixed-Effects Model:\nRegion Fixed Effects")+theme_classic()p2 <-plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) +ggtitle("Model Formula:\nyear * region + (1 | year)")p1 / p2
This is what I imagine a random effect for year would produce.
Code
# Enforcing year as a factormod_regionXyear <-lmer(b ~ year*survey_area + (1|factor(year)), data = df)cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var ="year")p1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = year.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Mixed-Effects Model:\nRegion Fixed Effects")+theme_classic()p2 <-plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) +ggtitle("Model Formula:\nyear * region + (1 | factor(year))")p1 / p2
Or use much simpler model?
If we care about the change over time differences I think we’d want year as a fixed effect. And indeed the coefficent estimates are the same.
Code
simple_mod <-lm(b ~ year*survey_area, data = df)simple_cont <- emmeans::emtrends(simple_mod, pairwise ~ survey_area, var ="year")p1 <-as.data.frame(simple_cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = year.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Fixed-Effects Only Model:\n Region Intercepts")+theme_classic()p2 <-plot(ggpredict(simple_mod, ~year+survey_area), add.data = T) +ggtitle("Model Formula:\nyear * region")p1 / p2
Proposing Alternative Random Effects Structure
I think there is some sense to have year as a R.E. It can help soak up some variance for things that happened to the four regions on any year that we don’t have measurements for (stratification dynamics, weather patterns, GSI, etc)
But I don’t know whether it should also be a fixed effect in the same model.
I don’t really care about whether regions have changed over time, our hypotheses are both focused on whether change can be associated with warming or fishing removals. So what if we swapped in sst?
Code
# New model:mod_regionXsst_anom <-lmer( b ~ sst_anom * survey_area + (1|year), data = df)# Get contrastscont <- emmeans::emtrends(mod_regionXsst_anom, pairwise ~ survey_area, var ="sst_anom")p1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = sst_anom.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Mixed-Effects Model Formula::\nb ~ sst_anom * region + (1 | year)\nRegion Intercepts")+theme_classic()# Predictionsp2 <-plot(ggpredict(mod_regionXsst_anom, ~year+survey_area), add.data = T) +ggtitle("Year x Slope No Relationship")p3 <-plot(ggpredict(mod_regionXsst_anom, ~sst_anom+survey_area), add.data = T) +ggtitle("sst_anom x Region Interaction Seems Like Something Now")p1 / p2 / p3
What about landings?
Landings are the other covariate that we have a defined hypothesis for.
Landings are also collinear with year.
Landings also are highest for Georges Bank and Gulf of Maine before SST is available. So there is a data availability challenge for trying to work with the two covariates in addition to multicollinearity….
As far as a picking an informed model structure: - Similar value overlap issue that raw SST had, where the landings values span different ranges
- I would want to have a regional interaction, because the species targeted and the fishing means will be different, likely leading to different impacts
So if I were just doing landings I would probably start here:
harvest is not a strong selective force for smaller body sizes from 1970-2019 in the Northwestern Atlantic. With some potential interaction of importance in the Gulf of Maine.
Which could support the notion that:
Reductions in harvest are indeed weakening harvest induced declines in body size.
However, the anticipated potential increases in body size due to lower harvest are now masked by increases in temperature.
The data offers evidence that declines in body size in the fish community in the Northwest Atlantic are due to relative increases in regional sst since 1982.
Testing What we Care about:
Here is how I think I would proceed:
What do We Know:
Changes in median body length and size spectrum slope have occurred within different regions of the Northeast shelf.
These regions have experienced increases in average temperature.
Ecological literature would suggest that elevated temperatures could impact the size structure of the community via mechanisms related to TSR theory.
The Northeast US is also home to a number of large-scale commercial Fisheries. Fisheries also have been shown to impact the size structure of communities through removal of larger individuals and selective pressures.
For 3/4 of our regions these two covariates are colinear. Making it difficult to use either in a model together. They are also both colinear with year further complicating things.
I think there is a good case to make for regions to be treated as random effects. With any of the following reasons contributing to samples within the regions not being statistically independent, among other possibilities:
- differential species composition
- different mixing/productivity regimes
- different fisheries
I just know less what this means in terms of any our hypotheses.
From a hypothesis standpoint I would anticipate the following: 1. SSt anomalies should have a global non-linear impact effect that is stable across regions (no interaction with region)
- Landings could very likely differ between region due to different fisheries, and different community compositions (interaction with landing)
- Regions have different community structure, bathymetry, current dynamics, and connectedness to different sized nearshore habitats. I would expect each to vary (intercept)
Modeling without Year
From the above hypotheses, this is the linear model structure I would use. Year is not included as a term in these models.
# Linear model with our primary hypotheseslength_noyr <-lm(med_len ~ region * landings + sst_anom, data = df )summary(length_noyr)
Call:
lm(formula = med_len ~ region * landings + sst_anom, data = df)
Residuals:
Min 1Q Median 3Q Max
-14.8119 -2.4864 -0.3935 2.4744 12.3483
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.46099 0.72904 36.296 <2e-16 ***
regionGeorges_Bank -0.09738 1.02173 -0.095 0.9242
regionMid_Atlantic_Bight -10.48518 1.01415 -10.339 <2e-16 ***
regionSouthern_New_England -9.92448 1.00205 -9.904 <2e-16 ***
landings 1.56107 0.80875 1.930 0.0556 .
sst_anom -0.11020 0.61238 -0.180 0.8574
regionGeorges_Bank:landings -0.96418 1.03600 -0.931 0.3536
regionMid_Atlantic_Bight:landings -2.16327 1.04527 -2.070 0.0403 *
regionSouthern_New_England:landings -1.71460 0.99528 -1.723 0.0871 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.118 on 142 degrees of freedom
(49 observations deleted due to missingness)
Multiple R-squared: 0.6101, Adjusted R-squared: 0.5881
F-statistic: 27.77 on 8 and 142 DF, p-value: < 2.2e-16
Code
# Get contrastscont <- emmeans::emtrends(length_noyr, pairwise ~ region, var ="landings")# Plot contrastsp1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(region), x = landings.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Fixed-Effects lm Formula::\nmed_length ~ sst_anom + landings * region")+theme_classic()# Predictionsp2 <-plot(ggpredict(length_noyr, ~sst_anom), add.data = T) +labs(y ="Median Length", x ="SST Anomaly")p3 <-plot(ggpredict(length_noyr, ~landings*region), add.data = T) +labs(y ="Median Length", x ="Standardized Regional landings")p1 / p2 / p3
Code
# Augment with Model fit and residuals:df_aug <-augment(length_noyr, drop_na(df, sst_anom, landings)) %>%mutate(resid_col =ifelse(.resid >0, "darkred", "royalblue"))# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, med_len, color ="Observed")) +geom_line(aes(year, .fitted, color ="Model Fit")) +facet_wrap(~region) +labs(y ="Residuals", title ="Model fits")
Code
# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, .resid), color ="gray") +geom_point(aes(year, .resid, color =I(resid_col))) +facet_wrap(~region) +labs(y ="Residuals", title ="Temporal Patterns in Residuals")
# Linear model with our primary hypothesesb_noyr <-lm(b ~ region * landings + sst_anom, data = df )summary(b_noyr)
Call:
lm(formula = b ~ region * landings + sst_anom, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.160563 -0.036144 -0.007886 0.033354 0.139481
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.922175 0.009809 -94.013 < 2e-16 ***
regionGeorges_Bank -0.012645 0.013747 -0.920 0.3592
regionMid_Atlantic_Bight -0.112674 0.013645 -8.258 9.35e-14 ***
regionSouthern_New_England -0.117426 0.013482 -8.710 7.09e-15 ***
landings 0.017087 0.010881 1.570 0.1186
sst_anom -0.003155 0.008239 -0.383 0.7024
regionGeorges_Bank:landings -0.008689 0.013939 -0.623 0.5340
regionMid_Atlantic_Bight:landings -0.034807 0.014064 -2.475 0.0145 *
regionSouthern_New_England:landings -0.007334 0.013391 -0.548 0.5848
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05541 on 142 degrees of freedom
(49 observations deleted due to missingness)
Multiple R-squared: 0.5207, Adjusted R-squared: 0.4937
F-statistic: 19.28 on 8 and 142 DF, p-value: < 2.2e-16
Code
# Get contrastscont <- emmeans::emtrends(b_noyr, pairwise ~ region, var ="landings")# Plot contrastsp1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(region), x = landings.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Fixed-Effects lm Formula::\nb ~ sst_anom + landings * region")+theme_classic()# Predictionsp2 <-plot(ggpredict(b_noyr, ~sst_anom), add.data = T) +labs(y ="Median Length", x ="SST Anomaly")p3 <-plot(ggpredict(b_noyr, ~landings*region), add.data = T) +labs(y ="Median Length", x ="Standardized Regional landings")p1 / p2 / p3
Code
# Augment with Model fit and residuals:df_aug <-augment(b_noyr, drop_na(df, sst_anom, landings)) %>%mutate(resid_col =ifelse(.resid >0, "darkred", "royalblue"))# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, b, color ="Observed")) +geom_line(aes(year, .fitted, color ="Model Fit")) +facet_wrap(~region) +labs(y ="Residuals", title ="Model fits")
Code
# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, .resid), color ="gray") +geom_point(aes(year, .resid, color =I(resid_col))) +facet_wrap(~region) +labs(y ="Residuals", title ="Temporal Patterns in Residuals")
Testing Change Over Time: Have Regions Seen Changes in Slope/Size
This would be preliminary/supplemental exploration of whether median size or size spectra slopes have changed within regions.
These would be straightforward linear models with a region interaction. They provide validity to statements of change over time and differences between regions, but do not speak to hypotheses on why those changes have occurred.
# Need a slope/intercept interaction between year and regionlength_change_lm <-lm(med_len ~ year * region, data = df)# # Test homegeneity of slopes - for ANCOVA# df %>% anova_test(med_len ~ region * year)# They aren't homogenous - pairwise test for differencesslopes_lst <-emtrends(length_change_lm, ~ region, var ="year")slopes_lst %>%as.data.frame() %>%gt() # slope estimates and CIs
#---- below might depend on slope assumption ------# ANOVA test for between significanceres.aov <- df %>%anova_test(med_len ~ year + region)get_anova_table(res.aov) %>%as.data.frame() %>%gt()
Effect
DFn
DFd
F
p
p<.05
ges
year
1
195
3.924
4.90e-02
*
0.020
region
3
195
97.198
1.65e-38
*
0.599
Code
# Pairwise comparisons# Get pairwise results for the factorspwc <- df %>%emmeans_test( med_len ~ region, covariate = year,p.adjust.method ="bonferroni") %>%add_xy_position(x ="region", fun ="mean_se")# Display the adjusted means of each group# Also called as the estimated marginal means (emmeans)get_emmeans(pwc) %>%ggplot(aes(emmean, fct_rev(region))) +geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +theme_gmri() +labs(x ="Median Length",y ="Region",title ="GB + GOM significantly Longer Lengths than MAB & SNE\n Not significantly different within the pairs")
Code
# Visualization: line plots with p-valuesget_emmeans(pwc) %>%left_join(select(df, survey_area, region)) %>%ggline(x ="survey_area", y ="emmean") +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.2) +stat_pvalue_manual(pwc, hide.ns =TRUE, tip.length =FALSE) +labs(subtitle =get_test_label(res.aov, detailed =TRUE),caption =get_pwc_label(pwc)) +labs(y ="Median Length (cm)", x ="Region")
Code
# Need a slope/intercept interaction between year and regionb_change_lm <-lm(b ~ year * region, data = df)# Test homegeneity of slopes - for ANCOVAdf %>%anova_test(b ~ region * year)
Effect
DFn
DFd
F
p
p<.05
ges
region
3
192
81.815
0.0e+00
*
0.561
year
1
192
6.132
1.4e-02
*
0.031
region:year
3
192
10.293
2.6e-06
*
0.139
Code
# They aren't homogenous - pairwise test for differencesslopes_lst <-emtrends(b_change_lm, ~ region, var ="year")slopes_lst %>%as.data.frame() %>%gt() # slope estimates and CIs
#---- below might depend on slope assumption ------# ANOVA test for between significanceres.aov <- df %>%anova_test(b ~ year + region)get_anova_table(res.aov) %>%as.data.frame() %>%gt()
Effect
DFn
DFd
F
p
p<.05
ges
year
1
195
5.365
2.20e-02
*
0.027
region
3
195
71.582
2.94e-31
*
0.524
Code
# Pairwise comparisons# Get pairwise results for the factorspwc <- df %>%emmeans_test( b ~ region, covariate = year,p.adjust.method ="bonferroni") %>%add_xy_position(x ="region", fun ="mean_se")# Display the adjusted means of each group# Also called as the estimated marginal means (emmeans)get_emmeans(pwc) %>%ggplot(aes(emmean, fct_rev(region))) +geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +theme_gmri() +labs(x ="Exponent of Size Spectra (b)",y ="Region",title ="GB + GOM Spectra Slopes Shallower than MAB & SNE\n Not significantly different within the pairs")
Code
# Visualization: line plots with p-valuesget_emmeans(pwc) %>%left_join(select(df, survey_area, region)) %>%ggline(x ="survey_area", y ="emmean") +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.2) +stat_pvalue_manual(pwc, hide.ns =TRUE, tip.length =FALSE) +labs(subtitle =get_test_label(res.aov, detailed =TRUE),caption =get_pwc_label(pwc)) +labs(y ="Exponent of Size Spectra (b)", x ="Region")
Source Code
---title: "Spectra_Model_EDA"author: name: "Adam A. Kemberling" title: "Quantitative Research Associate" url: https://github.com/adamkemberling affiliation: Gulf of Maine Research Institutedescription: | Exploration of Drivers and Model Formsdate: "`r Sys.Date()`"# format: docxformat: html: self-contained: true df-print: kable toc: true toc-depth: 2 code-fold: true code-tools: trueexecute: echo: true warning: false message: false fig.align: center---# Sea Surface Temperature and Size Change Expectations```{r}library(gt)library(tidyverse)library(gmRi)library(tidyverse)library(ggeffects)library(zoo)library(lme4)library(mgcv)library(patchwork)library(emmeans)library(rstatix)library(ggpubr)theme_set(theme_gmri() +theme(plot.margin =margin()) )#--------------------## Set up#--------------------# Load the data df <-read_csv(here::here("data/size_and_spectra_model_data.csv"))# Get raw landingslandings_raw <-read_csv(here::here("data/unscaled_spectra_predictor_dataset_wide.csv")) %>%select(year, survey_area, landings_raw = landings)# Get the raw sst from oisstsst_raw =read_csv(here::here("data/unscaled_regional_sst.csv"))# Put it all together and clarify the column namesdf <- df %>%select(-sst) %>%left_join(landings_raw) %>%left_join(sst_raw) %>%mutate(survey_area =factor( survey_area, levels =c("Gulf of Maine", "Georges Bank", "Southern New England", "Mid-Atlantic Bight")),region =str_replace_all(survey_area, "-| ", "_"),region =factor( region, levels =c("Gulf_of_Maine", "Georges_Bank", "Mid_Atlantic_Bight", "Southern_New_England")))```The following tabs detail how I'm thinking about whether to use SST in celsius or Celsius as anomalies to test our hypotheses about temperature's impact on size and size structure:::: panel-tabset### SST ActualRaw SST should (and does) increase with lower latitudes in the Northern hemisphere.The temperature size rule does not predict that the mean/median size of a community should be correlated to the ambient temperature. Though this may be the case, this is not what it contends with directly, or that is not my understanding of it.My thinking is that each regional community is likely composed of individuals that on-average are living near temperatures that are neither significantly "colder" or "warmer" than their preference. If there are mean/median differences in size, then I don't think it is a manifestation of the TSR.```{r}p1 <-ggplot(df) +geom_point(aes(sst, med_len, color = survey_area)) +geom_smooth(method ="lm", aes(sst, med_len)) +labs(x ="SST", y ="Median Length", title ="Median length is Negatively\nCorrelated With SST")p2 <-ggplot(df) +geom_point(aes(sst, med_wt, color = survey_area)) +geom_smooth(method ="lm", aes(sst, med_wt)) +labs(x ="SST", y ="Median Weight", title ="Median Weight is as Well")p3 <-ggplot(df) +geom_point(aes(sst, b, color = survey_area)) +geom_smooth(method ="lm", aes(sst, b)) +labs(x ="SST", y ="Size Spectra Slope", title ="Size Spectra Slope Also Appears to Be...\nBut it is unclear why it would.") p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect")```### SST as AnomaliesAnomalies are based off a 30-year climatology for each respective region, and should capture departures from the long-term average. This helps out with a problem of modeling SST x Region interactions, where the distributions of sst for the regions do not overlap well. The TSR contends that growth rates for species reared under higher temperatures show a plastic response where growth rates at early ages are elevated, and maturation happens earlier and at smaller sizes.> The temperature-size rule (TSR) is a widespread phenomenon, which describes the phenotypic plastic response of species’ size to temperature: individuals reared at colder temperatures mature as larger adults than at warmer temperatures.This effect IMO is better seen through anomalies, and I also would not expect sst anomalies to operate differently between regions since they are region-specific.```{r}p1 <-ggplot(df) +geom_boxplot(aes(x = survey_area, y = sst, color = survey_area)) +labs(x ="Area", y ="OISST", title ="There are Different Baseline SSTs\nAmong Regions")p2 <-ggplot(df, aes(x = survey_area, y = sst_anom, color = survey_area)) +geom_boxplot() +labs(title ="Anomalies (or scaled values)\nFrame in terms of regional average", x ="Area", y ="OISST Anomaly")p3 <-ggplot(drop_na(df, sst), aes(year, sst_anom, color = survey_area)) +geom_point(alpha =0.4) +geom_smooth(se = F, linewidth =1) +labs(title ="All regions have elevated\nSST conditions in last decade", x ="Year", y ="OISST Anomalies")p4 <-ggplot(drop_na(df, ersst), aes(year, ersst_anom, color = survey_area)) +geom_point(alpha =0.4) +geom_smooth(se = F, linewidth =1) +labs(title ="Departure from long-term trend\nstarts in 2000's", x ="Year", y ="ERSST Anomalies")(p1 + p3) +plot_layout(guides ="collect")``````{r}(p2 + p4) +plot_layout(guides ="collect")```::::These next plots explore the degree that SST anomalies are correlated with median size and spectra slope, and thoughts on what that relationship could/should be operating.::: panel-tabset### SST Anomalies and Size StructureHow do anomalies look with respect to these body-size metrics?In a bit of a surprise to me, there does not seem to be any clear signal that any of these body-size related indicators are correlated with SSt anomalies.```{r}p1 <-ggplot(df, aes(sst_anom, med_len)) +geom_point(aes(color = survey_area)) +geom_smooth(method ="lm", color ="black") +#geom_smooth(method = "lm", aes(color = survey_area), se = F) +labs(x ="SST Anomaly", y ="Median Length")p2 <-ggplot(df, aes(sst_anom, med_wt)) +geom_point(aes(color = survey_area)) +geom_smooth(method ="lm", color ="black") +#geom_smooth(method = "lm", aes(color = survey_area), se = F) +labs(x ="SST Anomaly", y ="Median Weight")p3 <-ggplot(df, aes(sst_anom, b)) +geom_point(aes(color = survey_area)) +geom_smooth(method ="lm", color ="black") +#geom_smooth(method = "lm", aes(color = survey_area), se = F) +labs(x ="SST Anomaly", y ="Size Spectra Slope") p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect") +plot_annotation(title ="SST anomalies show no global correlation, could be an interaction situation, could be nothing...")```### Region x SST Anomaly InteractionsIts possible that maybe the regions would have some differential response. However, I wouldn't anticipate them to behave differently under expectations from the TSR.It also does not appear that there is much evidence for that either...```{r}p1 <-ggplot(df, aes(sst_anom, med_len, color = survey_area)) +geom_point() +geom_smooth(method ="lm") +labs(x ="SST Anomaly", y ="Median Length")p2 <-ggplot(df, aes(sst_anom, med_wt, color = survey_area)) +geom_point() +geom_smooth(method ="lm") +labs(x ="SST Anomaly", y ="Median Weight")p3 <-ggplot(df, aes(sst_anom, b, color = survey_area)) +geom_point() +geom_smooth(method ="lm") +labs(x ="SST Anomaly", y ="Size Spectra Slope") p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect") +plot_annotation(title ="SST Anomaly x Region Interaction")```### SST Anomalies as Non-Linear ResponseI would almost expect SST anomalies to have a non-linear response shape of some dome-like shape mirroring a thermal preference curve.This visual check isn't very convincing of that though.```{r}ggplot(df, aes(sst_anom, b, color = survey_area)) +geom_point(aes()) +geom_smooth(methog ="gam") +facet_wrap(~survey_area, ncol =2)```:::# Should we Drop SST/SST Anomalies?So we know that using SST itself has some problems: \ 1. SST from different regions have poor overlap \ 2. SST is colinear with the passage of time and with the declines in landings**From Bart:** \> Therefore, I think the best approach is to drop SST and focus on year as the predictor (and a proxy for SST).**My Thoughts:** \Why would we want a proxy for SST when we have SST? Wouldn't it be more prudent to make a statement on a lack of SST's association with median size and spectra slope?> Including region and year brings in additional complications of collinearity. Each of the other variables shows trends through time. While it would be possible to follow a differencing procedure (e.g. remove the temporal trend) in these variables, I think a simpler and must easier approach is to just include year as both a predictor and a random effect, and remove all other covariates. What we are doing here is assuming that there is lots of stuff that differs with year, but we are going to push all that variation into the random effect component of the model in order to focus on the temporal trends by region.**My Thoughts:** \I can see some of the rationale of going this route for the convenience of navigating the multicollineary of time+sst+landings. To me, this option does not test either hypotheses for either landings or SST. I believe we'd have a decent model in terms of unbiased predictions and decent performance, but it seems like kind of a hands in the air for our original hypotheses and just focusing on whether there were changes over time.### Exploring years as a Random EffectI'm also confused whether you can/should use year both as a fixed and a random. And also whether you are using it as a continuous variable for fixed and intended to use it as a categorical variable in the random effects term. I don't think random effects can be continuous variables.The model formula you had in your summary resembles what I would anticipate for having "year" as a continuous fixed effect.```{r include = T}# Model summaries look the same if you enforce year as a factor for random effect or not# ggpredict plots do not.mod_regionXyear <-lmer(b ~ year*survey_area + (1|year), data = df)cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var ="year")p1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = year.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Mixed-Effects Model:\nRegion Fixed Effects")+theme_classic()p2 <-plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) +ggtitle("Model Formula:\nyear * region + (1 | year)")p1 / p2```This is what I imagine a random effect for year would produce.```{r include = T}# Enforcing year as a factormod_regionXyear <-lmer(b ~ year*survey_area + (1|factor(year)), data = df)cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var ="year")p1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = year.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Mixed-Effects Model:\nRegion Fixed Effects")+theme_classic()p2 <-plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) +ggtitle("Model Formula:\nyear * region + (1 | factor(year))")p1 / p2```## Or use much simpler model?If we care about the change over time differences I think we'd want year as a fixed effect. And indeed the coefficent estimates are the same.```{r}simple_mod <-lm(b ~ year*survey_area, data = df)simple_cont <- emmeans::emtrends(simple_mod, pairwise ~ survey_area, var ="year")p1 <-as.data.frame(simple_cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = year.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Fixed-Effects Only Model:\n Region Intercepts")+theme_classic()p2 <-plot(ggpredict(simple_mod, ~year+survey_area), add.data = T) +ggtitle("Model Formula:\nyear * region")p1 / p2```## Proposing Alternative Random Effects StructureI think there is some sense to have year as a R.E. It can help soak up some variance for things that happened to the four regions on any year that we don't have measurements for (stratification dynamics, weather patterns, GSI, etc)But I don't know whether it should also be a fixed effect in the same model.I don't really care about whether regions have changed over time, our hypotheses are both focused on whether change can be associated with warming or fishing removals. So what if we swapped in sst?```{r}# New model:mod_regionXsst_anom <-lmer( b ~ sst_anom * survey_area + (1|year), data = df)# Get contrastscont <- emmeans::emtrends(mod_regionXsst_anom, pairwise ~ survey_area, var ="sst_anom")p1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(survey_area), x = sst_anom.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Mixed-Effects Model Formula::\nb ~ sst_anom * region + (1 | year)\nRegion Intercepts")+theme_classic()# Predictionsp2 <-plot(ggpredict(mod_regionXsst_anom, ~year+survey_area), add.data = T) +ggtitle("Year x Slope No Relationship")p3 <-plot(ggpredict(mod_regionXsst_anom, ~sst_anom+survey_area), add.data = T) +ggtitle("sst_anom x Region Interaction Seems Like Something Now")p1 / p2 / p3```---# What about landings?Landings are the other covariate that we have a defined hypothesis for.Landings are also collinear with year.Landings also are highest for Georges Bank and Gulf of Maine before SST is available. So there is a data availability challenge for trying to work with the two covariates in addition to multicollinearity....As far as a picking an informed model structure: - Similar value overlap issue that raw SST had, where the landings values span different ranges\ - I would want to have a regional interaction, because the species targeted and the fishing means will be different, likely leading to different impacts\So if I were just doing landings I would probably start here:```{r}p1 <-ggplot(df, aes(year, landings, color = survey_area)) +geom_point() +geom_smooth(method ="gam") +labs(title ="Relative Regional Landings by Area")p2 <-ggplot(df, aes(landings, med_len, color = survey_area)) +geom_point() +geom_smooth(method ="gam") +labs(title ="median length ~ s(landings_scaled)")p3 <-ggplot(df, aes(landings, b, color = survey_area)) +geom_point() +geom_smooth(method ="gam") +labs(title ="b ~ s(landings_scaled)")p1 + p2 + p3 +guide_area() +plot_layout(ncol =2, guides ="collect")```This suggests to me that:harvest is not a strong selective force for smaller body sizes from 1970-2019 in the Northwestern Atlantic. With some potential interaction of importance in the Gulf of Maine.Which could support the notion that:Reductions in harvest are indeed weakening harvest induced declines in body size.However, the anticipated potential increases in body size due to lower harvest are now masked by increases in temperature.The data offers evidence that declines in body size in the fish community in the Northwest Atlantic are due to relative increases in regional sst since 1982.---# Testing What we Care about:Here is how I think I would proceed:### What do We Know:Changes in median body length and size spectrum slope have occurred within different regions of the Northeast shelf.These regions have experienced increases in average temperature.Ecological literature would suggest that elevated temperatures could impact the size structure of the community via mechanisms related to TSR theory.The Northeast US is also home to a number of large-scale commercial Fisheries. Fisheries also have been shown to impact the size structure of communities through removal of larger individuals and selective pressures.For 3/4 of our regions these two covariates are colinear. Making it difficult to use either in a model together. They are also both colinear with year further complicating things.```{r}library(ggpmisc)p1 <-ggplot(df, aes(sst_anom, landings)) +geom_point() +geom_smooth(method ="lm") +stat_poly_eq() +scale_y_continuous(expand =expansion(add =c(0,2))) +facet_wrap(~survey_area)p2 <-ggplot(df, aes(year, sst_anom)) +geom_point() +geom_smooth(method ="lm") +stat_poly_eq() +scale_y_continuous(expand =expansion(add =c(0,2))) +facet_wrap(~survey_area)p3 <-ggplot(df, aes(year, landings)) +geom_point() +geom_smooth(method ="lm") +stat_poly_eq() +scale_y_continuous(expand =expansion(add =c(0,2))) +facet_wrap(~survey_area)p1 + p2 + p3 +plot_layout(ncol =2) +plot_annotation(title ="Three-way colinearity")```### Should we even use RE?I think there is a good case to make for regions to be treated as random effects. With any of the following reasons contributing to samples within the regions not being statistically independent, among other possibilities:\ - differential species composition\ - different mixing/productivity regimes\ - different fisheries\I just know less what this means in terms of any our hypotheses.```{r}length_noyr_mm <-lmer(med_len ~ (1| survey_area) * landings + sst_anom, data = df )```# Moving Forward: Hypothesis Based Model StructuresFrom a hypothesis standpoint I would anticipate the following: 1. SSt anomalies should have a global non-linear impact effect that is stable across regions (no interaction with region)\ - Landings could very likely differ between region due to different fisheries, and different community compositions (interaction with landing) \ - Regions have different community structure, bathymetry, current dynamics, and connectedness to different sized nearshore habitats. I would expect each to vary (intercept)### Modeling without YearFrom the above hypotheses, this is the linear model structure I would use. Year is not included as a term in these models. ### Median Length Model:::: {.panel-tabset}#### Model and Summary```{r}# Linear model with our primary hypotheseslength_noyr <-lm(med_len ~ region * landings + sst_anom, data = df )summary(length_noyr)```#### Partial Dependence Plots```{r}# Get contrastscont <- emmeans::emtrends(length_noyr, pairwise ~ region, var ="landings")# Plot contrastsp1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(region), x = landings.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Fixed-Effects lm Formula::\nmed_length ~ sst_anom + landings * region")+theme_classic()# Predictionsp2 <-plot(ggpredict(length_noyr, ~sst_anom), add.data = T) +labs(y ="Median Length", x ="SST Anomaly")p3 <-plot(ggpredict(length_noyr, ~landings*region), add.data = T) +labs(y ="Median Length", x ="Standardized Regional landings")p1 / p2 / p3```#### Residuals```{r}# Augment with Model fit and residuals:df_aug <-augment(length_noyr, drop_na(df, sst_anom, landings)) %>%mutate(resid_col =ifelse(.resid >0, "darkred", "royalblue"))# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, med_len, color ="Observed")) +geom_line(aes(year, .fitted, color ="Model Fit")) +facet_wrap(~region) +labs(y ="Residuals", title ="Model fits")# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, .resid), color ="gray") +geom_point(aes(year, .resid, color =I(resid_col))) +facet_wrap(~region) +labs(y ="Residuals", title ="Temporal Patterns in Residuals")```:::### Spectra Slope Model:::: {.panel-tabset}#### Model and Summary```{r}# Linear model with our primary hypothesesb_noyr <-lm(b ~ region * landings + sst_anom, data = df )summary(b_noyr)```#### Partial Dependence Plots```{r}# Get contrastscont <- emmeans::emtrends(b_noyr, pairwise ~ region, var ="landings")# Plot contrastsp1 <-as.data.frame(cont$emtrends) %>%ggplot(aes(y =fct_rev(region), x = landings.trend))+geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+geom_vline(xintercept =0, linetype =4)+labs(x ="Coefficient value", y ="", title ="Fixed-Effects lm Formula::\nb ~ sst_anom + landings * region")+theme_classic()# Predictionsp2 <-plot(ggpredict(b_noyr, ~sst_anom), add.data = T) +labs(y ="Median Length", x ="SST Anomaly")p3 <-plot(ggpredict(b_noyr, ~landings*region), add.data = T) +labs(y ="Median Length", x ="Standardized Regional landings")p1 / p2 / p3```#### Residuals```{r}# Augment with Model fit and residuals:df_aug <-augment(b_noyr, drop_na(df, sst_anom, landings)) %>%mutate(resid_col =ifelse(.resid >0, "darkred", "royalblue"))# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, b, color ="Observed")) +geom_line(aes(year, .fitted, color ="Model Fit")) +facet_wrap(~region) +labs(y ="Residuals", title ="Model fits")# Temporal Patterns in Residualsggplot(df_aug) +geom_hline(yintercept =0) +geom_line(aes(year, .resid), color ="gray") +geom_point(aes(year, .resid, color =I(resid_col))) +facet_wrap(~region) +labs(y ="Residuals", title ="Temporal Patterns in Residuals")```:::---# Testing Change Over Time: Have Regions Seen Changes in Slope/SizeThis would be preliminary/supplemental exploration of whether median size or size spectra slopes have changed within regions.These would be straightforward linear models with a region interaction. They provide validity to statements of change over time and differences between regions, but do not speak to hypotheses on why those changes have occurred.::: {.panel-tabset}### Median Length Changes```{r}# Need a slope/intercept interaction between year and regionlength_change_lm <-lm(med_len ~ year * region, data = df)# # Test homegeneity of slopes - for ANCOVA# df %>% anova_test(med_len ~ region * year)# They aren't homogenous - pairwise test for differencesslopes_lst <-emtrends(length_change_lm, ~ region, var ="year")slopes_lst %>%as.data.frame() %>%gt() # slope estimates and CIspairs(slopes_lst) %>%as.data.frame() %>%gt() # comparisons#---- below might depend on slope assumption ------# ANOVA test for between significanceres.aov <- df %>%anova_test(med_len ~ year + region)get_anova_table(res.aov) %>%as.data.frame() %>%gt()# Pairwise comparisons# Get pairwise results for the factorspwc <- df %>%emmeans_test( med_len ~ region, covariate = year,p.adjust.method ="bonferroni") %>%add_xy_position(x ="region", fun ="mean_se")# Display the adjusted means of each group# Also called as the estimated marginal means (emmeans)get_emmeans(pwc) %>%ggplot(aes(emmean, fct_rev(region))) +geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +theme_gmri() +labs(x ="Median Length",y ="Region",title ="GB + GOM significantly Longer Lengths than MAB & SNE\n Not significantly different within the pairs")``````{r}# Visualization: line plots with p-valuesget_emmeans(pwc) %>%left_join(select(df, survey_area, region)) %>%ggline(x ="survey_area", y ="emmean") +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.2) +stat_pvalue_manual(pwc, hide.ns =TRUE, tip.length =FALSE) +labs(subtitle =get_test_label(res.aov, detailed =TRUE),caption =get_pwc_label(pwc)) +labs(y ="Median Length (cm)", x ="Region")```### Spectra Slope Changes```{r}# Need a slope/intercept interaction between year and regionb_change_lm <-lm(b ~ year * region, data = df)# Test homegeneity of slopes - for ANCOVAdf %>%anova_test(b ~ region * year)# They aren't homogenous - pairwise test for differencesslopes_lst <-emtrends(b_change_lm, ~ region, var ="year")slopes_lst %>%as.data.frame() %>%gt() # slope estimates and CIspairs(slopes_lst) %>%as.data.frame() %>%gt() # comparisons#---- below might depend on slope assumption ------# ANOVA test for between significanceres.aov <- df %>%anova_test(b ~ year + region)get_anova_table(res.aov) %>%as.data.frame() %>%gt()# Pairwise comparisons# Get pairwise results for the factorspwc <- df %>%emmeans_test( b ~ region, covariate = year,p.adjust.method ="bonferroni") %>%add_xy_position(x ="region", fun ="mean_se")# Display the adjusted means of each group# Also called as the estimated marginal means (emmeans)get_emmeans(pwc) %>%ggplot(aes(emmean, fct_rev(region))) +geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +theme_gmri() +labs(x ="Exponent of Size Spectra (b)",y ="Region",title ="GB + GOM Spectra Slopes Shallower than MAB & SNE\n Not significantly different within the pairs")``````{r}# Visualization: line plots with p-valuesget_emmeans(pwc) %>%left_join(select(df, survey_area, region)) %>%ggline(x ="survey_area", y ="emmean") +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.2) +stat_pvalue_manual(pwc, hide.ns =TRUE, tip.length =FALSE) +labs(subtitle =get_test_label(res.aov, detailed =TRUE),caption =get_pwc_label(pwc)) +labs(y ="Exponent of Size Spectra (b)", x ="Region")```:::